Problem Statement and Metrics
Learn about the problem statement and metrics for building an Ad click prediction machine learning system.
- Let’s understand the ad serving background before moving forward. An ad request goes through a waterfall model in which the publisher first tries to sell its inventory through direct sales at a high CPM (cost per mille, i.e., cost per thousand impressions). If it is unable to do so, the publisher passes the impression to other networks until it is sold.
2. Metrics design and requirements#
Metrics#
During the training phase, we can focus on machine learning metrics rather than revenue or CTR metrics. Below are the two types of metrics:
Offline metrics#
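- Normalized Cross-Entropy (NCE): NCE is the predictive logloss divided by the cross-entropy of the background CTR. This makes NCE insensitive to the background CTR. This is the NCE formula:

$$
\mathrm{NCE} = \frac{-\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]}{-\left[p \log p + (1 - p)\log(1 - p)\right]}
$$

where $y_i \in \{0, 1\}$ is the click label for impression $i$, $p_i$ is the predicted click probability, $N$ is the number of impressions, and $p$ is the background (average) CTR.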
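As a rough illustration of the definition above, the following minimal sketch computes NCE with the standard library only. It assumes labels in {0, 1} and uses the empirical CTR of the evaluation set as the background CTR; the function names are hypothetical and not part of any library.

```python
import math

def log_loss(labels, preds, eps=1e-15):
    """Average binary logloss for labels in {0, 1} and predicted click probabilities."""
    total = 0.0
    for y, p in zip(labels, preds):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

def normalized_cross_entropy(labels, preds):
    """Model logloss divided by the cross-entropy of the background CTR."""
    ctr = sum(labels) / len(labels)  # assumption: background CTR = empirical CTR
    if ctr in (0.0, 1.0):
        raise ValueError("background CTR must be strictly between 0 and 1")
    baseline = -(ctr * math.log(ctr) + (1 - ctr) * math.log(1 - ctr))
    return log_loss(labels, preds) / baseline

labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
always_ctr = [0.1] * 10            # always predict the background CTR
better = [0.8] + [0.05] * 9        # a more informative model

print(normalized_cross_entropy(labels, always_ctr))  # approximately 1.0
print(normalized_cross_entropy(labels, better))      # below 1.0
```

A model that always predicts the background CTR scores about 1.0, so values below 1.0 indicate that the model adds information beyond the average click rate.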
Online metrics#
- Revenue Lift: The percentage change in revenue over a period of time. Upon deployment, a new model serves a small percentage of traffic. The key decision is how to balance the share of traffic given to the new model against the duration of the A/B testing phase.
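One possible way to compute revenue lift from an A/B test is sketched below. Because the new model only sees a small share of traffic, this sketch normalizes revenue by impressions before comparing the two groups; that normalization, the function name, and the numbers are assumptions for illustration, not something the lesson prescribes.

```python
def revenue_lift(control_revenue, control_impressions,
                 treatment_revenue, treatment_impressions):
    """Percentage change in revenue per impression, treatment vs. control."""
    control_rate = control_revenue / control_impressions
    treatment_rate = treatment_revenue / treatment_impressions
    return (treatment_rate - control_rate) / control_rate * 100.0

# Hypothetical test: the new model serves 5% of traffic.
lift = revenue_lift(
    control_revenue=9500.0, control_impressions=950_000,
    treatment_revenue=525.0, treatment_impressions=50_000,
)
print(round(lift, 2))  # 5.0
```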
Requirements#
Training#
- Imbalanced data: The Click Through Rate (CTR) is very small in practice (1%-2%), which makes supervised training difficult. We need a way to train the model that can handle highly imbalanced data.
- Retraining frequency: We need the ability to retrain models many times within one day to capture the data distribution shift in the production environment.
- Train/validation data split: To simulate a production system, the training data and validation data are partitioned by time.
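The time-based split in the last requirement can be sketched as follows. The record layout (`timestamp`, `clicked`) and the cutoff are hypothetical; the point is only that every validation record is later than every training record, as it would be in production.

```python
from datetime import datetime

def split_by_time(rows, cutoff):
    """Split impression logs so validation data is never earlier than the cutoff."""
    train = [row for row in rows if row["timestamp"] < cutoff]
    validation = [row for row in rows if row["timestamp"] >= cutoff]
    return train, validation

# Hypothetical impression log: one dict per ad impression.
rows = [
    {"timestamp": datetime(2024, 1, 1, 9), "clicked": 0},
    {"timestamp": datetime(2024, 1, 1, 15), "clicked": 1},
    {"timestamp": datetime(2024, 1, 2, 10), "clicked": 0},
    {"timestamp": datetime(2024, 1, 2, 18), "clicked": 0},
]
train, validation = split_by_time(rows, cutoff=datetime(2024, 1, 2))
print(len(train), len(validation))  # 2 2
```

A random split would leak future impressions into training, which tends to make offline metrics look better than what the model achieves once deployed.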
Inference#
- Serving: Low latency (50 ms to 100 ms) for ad prediction.
- Latency: Ad requests go through a waterfall model, so the ML model's prediction latency must be low.
- Overspending: If the ad serving model repeatedly serves the same ads, it might overspend the campaign budget, and publishers lose money.
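To make the overspending requirement concrete, here is a deliberately simple sketch of a budget check applied before ranking. The data layout and the function name `filter_by_budget` are hypothetical, and a real ad server would need a more careful pacing mechanism; this only shows the idea of dropping ads whose campaign can no longer afford the bid.

```python
def filter_by_budget(candidates, spend, budgets):
    """Keep only ads whose campaign still has enough budget left for the bid."""
    eligible = []
    for ad in candidates:
        campaign = ad["campaign_id"]
        remaining = budgets[campaign] - spend.get(campaign, 0.0)
        if ad["bid"] <= remaining:
            eligible.append(ad)
    return eligible

# Hypothetical campaign state at serving time.
budgets = {"c1": 100.0, "c2": 50.0}
spend = {"c1": 99.5, "c2": 10.0}
candidates = [
    {"ad_id": "a1", "campaign_id": "c1", "bid": 1.0},
    {"ad_id": "a2", "campaign_id": "c2", "bid": 1.0},
]
print([ad["ad_id"] for ad in filter_by_budget(candidates, spend, budgets)])  # ['a2']
```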
Summary#
Type | Desired goals |
---|---|
Metrics | Reasonable normalized cross-entropy and click through rate |
Training | Ability to handle imbalanced data |
Training | High throughput with the ability to retrain many times per day |
Inference | Latency from 50 to 100 ms |
Inference | Ability to control or avoid overspending the campaign budget while serving ads |
Quiz on loss function
For the loss function, why can’t we use the logloss function?
A) It’s difficult to interpret when compared with accuracy, precision, and recall metrics.
B) Logloss is only applicable for binary classification and can be sensitive to background CTR.